Long-term balancing selection for pathogen resistance maintains trans-species polymorphisms in a planktonic crustacean

Balancing selection is an evolutionary process that maintains genetic polymorphisms at selected loci and strongly reduces the likelihood of allele fixation. When allelic polymorphisms that predate speciation events are maintained independently in the resulting lineages, a pattern of trans-species polymorphisms may occur. Trans-species polymorphisms have been identified for loci related to mating systems and the MHC, but they are generally rare. Trans-species polymorphisms in disease loci are believed to be a consequence of long-term host-parasite coevolution by balancing selection, the so-called Red Queen dynamics. Here we scan the genomes of three crustaceans with a divergence of over 15 million years and identify 11 genes containing identical-by-descent trans-species polymorphisms with the same polymorphisms in all three species. Four of these genes display molecular footprints of balancing selection and have a function related to immunity. Three of them are located in or close to loci involved in resistance to a virulent bacterial pathogen, Pasteuria, with which the Daphnia host is known to coevolve. This provides rare evidence of trans-species polymorphisms for loci known to be functionally relevant in interactions with a widespread and highly specific parasite. These findings support the theory that specific antagonistic coevolution is able to maintain genetic diversity over millions of years.


Supplementary Figure 3. Principal Component Analysis (PCA) based on 92,608 unlinked SNPs describing species diversification including D. magna and D. sinensis genotypes. D. magna WE is the Western Eurasian clade of this species, EA is the East Asian clade and NA the North American clade. Source data are provided as a Source Data file.

Supplementary Figure 4. Principal Component Analysis (PCA) based on 92,608 unlinked SNPs describing species diversification including D. similis and D. sinensis genotypes. Source data are provided as a Source Data file.

Supplementary Figure 5. Principal Component Analysis (PCA) based on 92,608 unlinked SNPs describing species diversification including all D. magna genotypes. D. magna WE is the Western Eurasian clade of this species, EA is the East Asian clade and NA the North American clade. Source data are provided as a Source Data file.

Supplementary Figure 6. Best ML mitochondrial phylogeny including a subset of the Daphnia genotypes analyzed in this study. D. magna WE is the Western Eurasian clade of this species, EA is the East Asian clade and NA the North American clade. D. hispanica was used as an outgroup, as it is closely related to the other species. Source data are provided as a Source Data file.

Supplementary Figure 7. Dot plots (left) of nucleotide diversity (π) along contig 11F, where two putative TSPs were identified. Each dot is the average of π in a 5 kb non-overlapping sliding window. The red line represents the moving average of π along the contig, and the horizontal dotted line represents the overall average of π in the contig. The blue and brown dashed vertical lines indicate the coordinates of the ABC locus and the F locus, respectively. The green triangles indicate the positions of the two TSPs. Violin plots (right) display the distribution of π in the candidate regions and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR_1: candidate region surrounding the ABC locus; BG_1: background of contig 11F relative to CR_1; CR_2: candidate region within the F locus; BG_2: background of contig 11F relative to CR_2; 20 contigs: π calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 8. Dot plots (left) of nucleotide diversity (π) along contig 18F, where one putative TSP was identified. Each dot is the average of π in a 5 kb non-overlapping sliding window. The red line represents the moving average of π along the contig, and the horizontal dotted line represents the overall average of π in the contig. The dashed vertical lines indicate the coordinates of the D locus. The green triangle indicates the position of the TSP. Violin plots (right) display the distribution of π in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR: candidate region; BG: background of contig 18F; 20 contigs: π calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 9. Dot plots (left) of nucleotide diversity (π) along contig 28F, where two putative TSPs were identified. Each dot is the average of π in a 5 kb non-overlapping sliding window. The red line represents the moving average of π along the contig, and the horizontal dotted line represents the overall average of π in the contig. The green triangles indicate the positions of the TSPs. Violin plots (right) display the distribution of π in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR: candidate region; BG: background of contig 28F; 20 contigs: π calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 10. Dot plots (left) of Tajima's D along contig 11F, where two putative TSPs were identified. Each dot is the average of Tajima's D in a 5 kb non-overlapping sliding window. The red line represents the moving average of Tajima's D along the contig, and the horizontal dotted line represents the overall average of Tajima's D in the contig. The blue and brown dashed vertical lines indicate the coordinates of the ABC locus and the F locus, respectively. The green triangles indicate the positions of the two TSPs. Violin plots (right) display the distribution of Tajima's D in the candidate regions and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR_1: candidate region surrounding the ABC locus; BG_1: background of contig 11F relative to CR_1; CR_2: candidate region within the F locus; BG_2: background of contig 11F relative to CR_2; 20 contigs: Tajima's D calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 11. Dot plots (left) of Tajima's D along contig 18F, where one putative TSP was identified. Each dot is the average of Tajima's D in a 5 kb non-overlapping sliding window. The red line represents the moving average of Tajima's D along the contig, and the horizontal dotted line represents the overall average of Tajima's D in the contig. The dashed vertical lines indicate the coordinates of the D locus. The green triangle indicates the position of the TSP. Violin plots (right) display the distribution of Tajima's D in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR: candidate region; BG: background of contig 18F; 20 contigs: Tajima's D calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 12. Dot plots (left) of Tajima's D along contig 28F, where two putative TSPs were identified (only one remained in the final set of candidates). Each dot is the average of Tajima's D in a 5 kb non-overlapping sliding window. The red line represents the moving average of Tajima's D along the contig, and the horizontal dotted line represents the overall average of Tajima's D in the contig. The green triangles indicate the positions of the TSPs. Violin plots (right) display the distribution of Tajima's D in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR: candidate region; BG: background of contig 28F; 20 contigs: Tajima's D calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 13. Dot plots (left) of Fst along contig 11F, where two putative TSPs were identified. Each dot is the average of Fst in a 5 kb non-overlapping sliding window. The red line represents the moving average of Fst along the contig, and the horizontal dotted line represents the overall average of Fst in the contig. The blue and brown dashed vertical lines indicate the coordinates of the ABC locus and the F locus, respectively. The green triangles indicate the positions of the two TSPs. Violin plots (right) display the distribution of Fst in the candidate regions and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Plots of Fst between D. magna and D. similis are shown at the top, between D. similis and D. sinensis in the middle and between D. magna and D. sinensis at the bottom of the figure. CR_1: candidate region surrounding the ABC locus; BG_1: background of contig 11F relative to CR_1; CR_2: candidate region within the F locus; BG_2: background of contig 11F relative to CR_2; 20 contigs: Fst calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 14. Dot plots (left) of Fst along contig 18F, where one putative TSP was identified. Each dot is the average of Fst in a 5 kb non-overlapping sliding window. The red line represents the moving average of Fst along the contig, and the horizontal dotted line represents the overall average of Fst in the contig. The dashed vertical lines indicate the coordinates of the D locus. The green triangle indicates the position of the TSP. Violin plots (right) display the distribution of Fst in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Plots of Fst between D. magna and D. similis are shown at the top, between D. similis and D. sinensis in the middle and between D. magna and D. sinensis at the bottom of the figure. CR: candidate region; BG: background of contig 18F; 20 contigs: Fst calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 15. Dot plots (left) of Fst along contig 28F, where two putative TSPs were identified (only one remained in the final set of candidates). Each dot is the average of Fst in a 5 kb non-overlapping sliding window. The red line represents the moving average of Fst along the contig, and the horizontal dotted line represents the overall average of Fst in the contig. The green triangles indicate the positions of the TSPs. Violin plots (right) display the distribution of Fst in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Plots of Fst between D. magna and D. similis are shown at the top, between D. similis and D. sinensis in the middle and between D. magna and D. sinensis at the bottom of the figure. CR: candidate region; BG: background of contig 28F; 20 contigs: Fst calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 16. Dot plots (left) of beta scores (using -maf 0.05) along contig 11F, where two putative TSPs were identified. Each dot is the average of beta in a 5 kb non-overlapping sliding window. The red line represents the moving average of beta along the contig, and the horizontal dotted line represents the overall average of beta in the contig. The blue and brown dashed vertical lines indicate the coordinates of the ABC locus and the F locus, respectively. The green triangles indicate the positions of the two TSPs. Violin plots (right) display the distribution of beta in the candidate regions and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR_1: candidate region surrounding the ABC locus; BG_1: background of contig 11F relative to CR_1; CR_2: candidate region within the F locus; BG_2: background of contig 11F relative to CR_2; 20 contigs: beta calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 17. Dot plots (left) of beta scores (using -maf 0.05) along contig 18F, where one putative TSP was identified. Each dot is the average of beta in a 5 kb non-overlapping sliding window. The red line represents the moving average of beta along the contig, and the horizontal dotted line represents the overall average of beta in the contig. The dashed vertical lines indicate the coordinates of the D locus. The green triangle indicates the position of the TSP. Violin plots (right) display the distribution of beta in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR: candidate region; BG: background of contig 18F; 20 contigs: beta calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Supplementary Figure 18. Dot plots (left) of beta scores (using -maf 0.05) along contig 28F, where two putative TSPs were identified (only one remained in the final set of candidates). Each dot is the average of beta in a 5 kb non-overlapping sliding window. The red line represents the moving average of beta along the contig, and the horizontal dotted line represents the overall average of beta in the contig. The green triangles indicate the positions of the TSPs. Violin plots (right) display the distribution of beta in the candidate region and in the background. The P-values (two-sided Wilcoxon tests) for the differences between groups are reported. Daphnia magna plots are shown at the top, D. similis plots in the middle and D. sinensis plots at the bottom of the figure. CR: candidate region; BG: background of contig 28F; 20 contigs: beta calculated as background in the 20 longest contigs. Source data are provided as a Source Data file.

Principal Component Analysis (PCA) based on 92,608 unlinked SNPs describing species diversification including D. magna and D. similis genotypes. D. magna WE is the Western Eurasian clade of this species, EA is the East Asian clade and NA the North American clade. Source data are provided as a Source Data file.

Supplementary Table 2.
Summary of Wilcoxon rank sum tests comparing nucleotide diversity, calculated in 5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.
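The comparison summarised in these tables is a one-sided Wilcoxon rank sum test between window values in a candidate region and window values in a background set. As a rough illustration, the sketch below implements the test with a normal approximation, tie correction and continuity correction. It is not the authors' code. The direction of the alternative (candidate greater than background) is an assumption, since the caption states only that the tests are one-sided. For a statistic expected to be lower in the candidate region, the two arguments would be swapped.

```python
import math


def rank_sum_test_greater(x, y):
    """One-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Alternative hypothesis: values in x tend to be larger than values in y.
    Uses the normal approximation with tie and continuity corrections.
    Returns (U statistic for x, one-sided P-value).
    """
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence either way
    z = (u1 - mean_u - 0.5) / math.sqrt(var_u)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u1, p_value


if __name__ == "__main__":
    # Hypothetical window values of pi for a candidate region and a background.
    candidate = [0.031, 0.027, 0.035, 0.029]
    background = [0.012, 0.015, 0.010, 0.018, 0.014, 0.011]
    print(rank_sum_test_greater(candidate, background))
```

For small samples an exact test is preferable to the normal approximation used here.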

Supplementary Table 3.
Summary of Wilcoxon rank sum tests comparing nucleotide diversity, calculated in 2.5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 4.
Summary of Wilcoxon rank sum tests comparing nucleotide diversity, calculated in 1 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 5.
Summary of Wilcoxon rank sum tests comparing Tajima's D, calculated in 5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.
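For reference, Tajima's D in a window can be computed from the number of sampled sequences, the number of segregating sites and the mean number of pairwise differences. The sketch below follows the standard published formula. The function name and the example values are illustrative assumptions, and it does not represent the software the authors used.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for one window.

    n         : number of sampled sequences (chromosomes)
    seg_sites : number of segregating sites S in the window
    pi        : mean number of pairwise differences in the window
                (summed over sites, not divided by window length)
    Returns None when D is undefined (fewer than 2 sequences or S == 0).
    """
    if n < 2 or seg_sites == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    # Difference between the two estimators of theta, scaled by its s.d.
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical window: 10 sequences, 16 segregating sites.
    print(tajimas_d(10, 16, 3.888889))  # excess of rare variants: negative D
    print(tajimas_d(10, 16, 8.0))       # excess of intermediate variants: positive D
```

Positive values indicate an excess of intermediate-frequency variants, which is the pattern expected under balancing selection.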

Supplementary Table 6.
Summary of Wilcoxon rank sum tests comparing Tajima's D, calculated in 2.5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 7.
Summary of Wilcoxon rank sum tests comparing Tajima's D, calculated in 1 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 8.
Summary of Wilcoxon rank sum tests comparing Fst, calculated in 5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.
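The caption does not say which Fst estimator was used. Purely as an illustration of a windowed Fst between two species, the sketch below uses Hudson's estimator for biallelic SNPs and combines SNPs as a ratio of averages. The choice of estimator, the function name and the input layout are all assumptions, not the authors' method.

```python
def hudson_fst_window(snps):
    """Windowed Fst between two populations using Hudson's estimator.

    snps: iterable of (p1, n1, p2, n2) tuples, one per biallelic SNP, where
          p1 and p2 are the frequencies of the same allele in each population
          and n1 and n2 are the numbers of sampled chromosomes (each >= 2).
    SNPs are combined as a ratio of averages (sum of numerators divided by
    sum of denominators). Returns None if the denominator is zero.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, n1, p2, n2 in snps:
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator <= 0:
        return None
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical window: one fixed difference and one shared polymorphism.
    fixed_difference = (1.0, 20, 0.0, 20)
    shared_polymorphism = (0.5, 20, 0.5, 20)
    print(hudson_fst_window([fixed_difference]))
    print(hudson_fst_window([fixed_difference, shared_polymorphism]))
```

In this toy example the shared polymorphism pulls the window estimate below the value obtained from the fixed difference alone, which is the direction expected in regions carrying trans-species polymorphisms.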

Supplementary Table 9.
Summary of Wilcoxon rank sum tests comparing Fst, calculated in 2.5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 10.
Summary of Wilcoxon rank sum tests comparing Fst, calculated in 1 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 11.
Summary of Wilcoxon rank sum tests comparing beta scores (-maf 0.05), calculated in 5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 12.
Summary of Wilcoxon rank sum tests comparing beta scores (-maf 0.1), calculated in 5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 13.
Summary of Wilcoxon rank sum tests comparing beta scores (-maf 0.17), calculated in 5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 14.
Summary of Wilcoxon rank sum tests comparing beta scores (-maf 0.05), calculated in 2.5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 15.
Summary of Wilcoxon rank sum tests comparing beta scores (-maf 0.1), calculated in 2.5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.

Supplementary Table 16.
Summary of Wilcoxon rank sum tests comparing beta scores (-maf 0.17), calculated in 2.5 kb windows, in the six candidate regions against the rest of the contig where each region lies and against the 20 longest contigs in the D. magna genome not containing candidate regions. P-values ≤ 0.05 are in bold. Contig refers to the contig of the D. magna reference genome 3.0 (Fields et al. in prep.). All tests are one-sided Wilcoxon tests.